Idarex: Formal Description of Multi-word Lexemes with Regular Expressions

Authors

  • Giuseppe Valetto
  • Elisabeth Breidt

Abstract

Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account both in their description and in their recognition in texts. We propose describing their syntactic restrictions and idiosyncratic peculiarities with local grammar rules, which at the same time express, in a general way, regularities valid for a whole class of MWLs. The local grammars can be written in a convenient and compact way as regular expressions in the formalism IDAREX, which uses a two-level morphology. IDAREX allows the definition of various types of variables and the mixing of canonical and inflected word forms in the regular expressions.
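As a rough illustration of the idea of mixing canonical and inflected forms with variables in one pattern, the following Python sketch compiles a token pattern into an ordinary regular expression over a lemmatized sentence. It is not IDAREX syntax and does not use two-level morphology. The helpers `form`, `lemma`, `gap` and `find_mwl` are hypothetical, and lemmas are assumed to be supplied by an upstream tagger.

```python
import re
from typing import List, Tuple

# A lemmatized token: (inflected surface form, canonical lemma).
Token = Tuple[str, str]


def form(word: str) -> str:
    """Match one token whose inflected surface form is exactly `word`."""
    return re.escape(word) + r"/[^/ ]+ "


def lemma(base: str) -> str:
    """Match one token in any inflected form of the canonical lemma `base`."""
    return r"[^/ ]+/" + re.escape(base) + " "


def gap(max_tokens: int) -> str:
    """Variable: skip between 0 and `max_tokens` arbitrary tokens."""
    return r"(?:[^/ ]+/[^/ ]+ ){0,%d}" % max_tokens


def encode(tokens: List[Token]) -> str:
    # Each token becomes "surface/lemma " so one flat string can be searched.
    # Assumes neither field contains "/" or a space.
    return "".join(f"{surface}/{base} " for surface, base in tokens)


def find_mwl(pattern: List[str], tokens: List[Token]) -> List[str]:
    """Return the surface text of every match of `pattern` in `tokens`."""
    # (?<![^ ]) forces each match to start at a token boundary
    # (start of string or right after a space).
    regex = re.compile(r"(?<![^ ])" + "".join(pattern))
    return [
        " ".join(tok.split("/")[0] for tok in m.group(0).split())
        for m in regex.finditer(encode(tokens))
    ]


# Canonical "kick" (any inflection), but fixed inflected forms "the bucket".
kick = [lemma("kick"), form("the"), form("bucket")]
print(find_mwl(kick, [("He", "he"), ("kicked", "kick"), ("the", "the"), ("bucket", "bucket")]))
# ['kicked the bucket']

# Canonical "pull", up to two inserted tokens, then the fixed form "strings".
pull = [lemma("pull"), gap(2), form("strings")]
print(find_mwl(pull, [("She", "she"), ("pulled", "pull"), ("a", "a"), ("few", "few"), ("strings", "string")]))
# ['pulled a few strings']
```

Here `lemma("kick")` admits *kicks* or *kicked*, while `form("bucket")` rejects *buckets*; `gap` stands in for a bounded variable. A real system would build a finite-state transducer rather than a backtracking regex.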

Similar resources

Formal Description of Multi-Word Lexemes with the Finite-State Formalism IDAREX

Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account for their description and their recognition in texts. We suggest to describe their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time allow to express in a general way regularities valid for a whole class of MWLs. The local grammars can be...

Local Grammars for the Description of Multi-Word Lexemes and their Automatic Recognition in Texts

Most multi-word lexemes (MWLs) allow certain types of variation. This has to be taken into account for their description to be able to recognize them in texts. We suggest to describe their syntactic restrictions and their idiosyncratic peculiarities with local grammar rules, which at the same time permit to express regularities valid for a whole class of MWLs such as word order variation in Ger...

CoCoCo: Online Extraction of Russian Multiword Expressions

In the CoCoCo project we develop methods to extract multi-word expressions of various kinds—idioms, multi-word lexemes, collocations, and colligations—and to evaluate their linguistic stability in a common, uniform fashion. In this paper we introduce a Web interface, which provides the user with access to these measures, to query Russian-language corpora. Potential users of these tools include ...

Derivatives for Enhanced Regular Expressions

Regular languages are closed under a wealth of formal language operators. Incorporating such operators in regular expressions leads to concise language specifications, but the transformation of such enhanced regular expressions to finite automata becomes more involved. We present an approach that enables the direct construction of finite automata from regular expressions enhanced with further o...

One-unambiguity of regular expressions with numeric occurrence indicators

Regular expressions with numeric occurrence indicators are an extension of traditional regular expressions, which let the required minimum and the allowed maximum number of iterations of subexpressions be described with numeric parameters. We consider the problem of testing whether a given regular expression E with numeric occurrence indicators is 1-unambiguous or not. This condition means, inf...


Publication date: 1995